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1. Introduction 

This report summarizes the activity performed under NASA-Grant NAG3-1097 during 1991. 
During that period, work centered on the following tasks: (1) To investigate more efficient solvers for 
ALGAE, (2) To modify the plotting package for ALGAE, and (3) To validate ALGAE by simulating 
oscillating flow problems similar to those studied by Kurzweg and Ibrahim et. al. [8] - [10]. 

2* New Solution Methods for ALGAE 

This section presents a description of the new direct and iterative solution methods that supple- 
ment the original frontal method in the computer program ALGAE. It also discusses a new iterative 
method for the solution of the discrete energy equations. 

2.1. Bordered Banded Solver 

The bordered banded solver implemented in ALGAE makes use of customized CRAY Y/MP rou- 
tines. A genera] reference for direct solver strategies may be found in, for example, [1]. The ALGAE 
algorithm is basically a block Gaussian elimination strategy which exploits the structure of the dual 
variable system and the Cray library routines in LAPACK, [2]. Also, unlike the frontal solver in 
ALGAE, the bordered banded solver makes no use of disk storage for problems as large as 10,000 flow 
cells. Timing comparisons are included which indicate a speedup factor of 5 to 6 for moderate size 
problems and a factor of 4 to 5 for larger problems when compared with an optimized frontal solver. 
These factors are considerably larger (50 to 200) when the bordered banded solver times are compared 
to those of the origiml (single precision) ALGAE frontal solver. 

2.1.1. Reordering of the equations and dual variables 
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The system that is solved at each time step in ALGAE is composed of equations having two 
separate nonzero structures. Normally an equation in the dual variable system has a zero-nonzero 
structure which is similar to that resulting from the 13 point finite difference stencil approximating the 
biharmonic operator. That is, there are at most 13 nonzero entries in the equation, and the semi- 
bandwidth is generally 2N for an N by N flow cell problem. The second type of equation arises when 
there are obstructions in the flow region or if the flow is pressure driven. In such cases there are equa- 
tions in which there are generally many more nonzero entries and these may occur outside the band of 
width 4N. In terms of network terminology, cycles or countries that are composed of 4 or less links 
give rise to equations that are of the first type, while cycles that circumscribe obstacles or connect pres- 
sure specified boundary segments may give rise to equations of the second type. 

The first modification to the solution strategy in ALGAE is a reordering of the unknown dual 
variables and equations so that the more complex type 2 equations occur last in the system. The result- 
ing system is then bordered banded and can be written as 

B X = K, 

where B is a block matrix of the form 

_ B n B n 
B 2 \ B 2 2 

The matrix B u is banded and very sparse, the dimension of B 22 is usually quite small, and the matrices 
in the border (B 12 , B 21/ and B 22 ) are generally dense. Note that B 22 may be void, i.e., the matrix B on 
occasion may be banded. It is assumed that X =(X Ir X 2 ) and K = (K u K 2 ) are partitioned consistently 
with the partitioning of B . 

2.1.2. Storing nonzero entries 

To eliminate the use of disk I/O during the solution of the dual variable system, the positions 
(/, /) of possible nonzero entries in the dual variable system are calculated once and for all. These posi- 
tions are the same for each time step of the transient. Given these positions, only the nonzero entries of 
the system are calculated. 
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2.1.3. Bordered banded solution algorithm 

The bordered banded algorithm used is contained in the software package BORMAT available 
from the Pittsburgh Supercomputer Center [3]. Note that the BLAS, LAPACK and LINPACK routines 
used in BORMAT are available on the CRAY in SCILIB. (The LAPACK option is used in BORMAT). 

The bordered banded system is factored and the resulting factored system is then solved. The 
process is as follows: 

(1) B u is factored and the L-U factors L\\U n are stored. 

(2) The system 

L\\U ]] Z = -B 12 , 

is solved and B 12 is overwritten with the result. 

(3) Gaussian elimination of the two block rowed system requires the calculation of 

B ' 22 = B 22 - B2\(L n U u ) 1 B\2 = B 22 + B 2]Z 
using the Z in step (2). The result is stored in B 2 2 * 

(4) The new (2,2) block determined in step (3) 

B ' 27 ~ L 22 U 22 f 

is factored and the result stored in the same space. 

(5) The subsystem 

^11^11 Y \ = Ki . 

is solved. 

(6) K 2 is replaced by K 2 - B 2 \ Y 1 . 

(7) The subsystem 

(L 22 L/ 22 ) x 2 = k 2 . 


is solved. 



- 4 - 


(8) The solution X t is calculated as 

X t = Y, + B 21 X 2 . 

2.1.4. Performance Data 

Four data files were used to initially test the modified ALGAE solution options. The first prob- 
lem involved a preheater section geometry, the second dealt with flow through an aircraft compart- 
ment containing obstacles, and the third and fourth were driven cavity geometries. The following are 
the statistics for these problems. NOB denotes the dimension of B 22 - 


Problem 

Flow cells 

Dual variables 

NOB 

Semi-bandwidth 

Preheater 



1 

45 

Aircraft compartment 

1174 

1065 

3 

60 

Small cavity 

2500 

2401 

0 

98 

Large cavity 

10000 

9801 

0 

198 


Three solvers were tested; the original frontal solver, an optimized version of the frontal solver, 
and finally the bordered banded solver described above. The following are timings for a CRAY Y/MP 
using one processor. (This enhances the performance of the bordered banded solver and has very little 
effect on that of the frontal solver.) 

Each problem was run for ten steps. The following table lists the average time in seconds to solve 
the dual variable system for each of the four problems. Due to the excessive cost, the large cavity prob- 
lem timings for the original frontal scheme (designated *) are for a single step. Also included are the 
ratios of the solve times for the optimized frontal solver to those of the bordered banded solver. 


Average Time to Solve One System 

Problem 

Original frontal 

Optimized frontal 

Bordered banded 

Ratio 

Preheater 

10.52 

0.51 

0.05 

9.3 

Aircraft compartment 

16.94 

0.82 

0.09 

8.3 

Small cavity 

38.46 

2.04 

0.32 

6.4 

Large cavity 

191.08* 

14.22 

3.52 

4.0 


The next table contains the CPU charges on the Y/MP for the ten steps of each of the runs. This 
includes setup, solution times and I/O for the entire run. The CPU time (designated **) for the original 
frontal solver was determined by multiplying that for a single step by ten. As above, the ratios of these 
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times for the optimized frontal and bordered banded solvers are also given. Note that there is a one 
time additional cost for the setup of the bordered-banded solver. These ratios would increase if this 
cost was amortized over more steps. 


Total CPU Time Charged 

Problem 

Original frontal 

Optimized frontal 

Bordered banded 

Ratio 

Preheater 

108.29 

7.49 

2.99 

2.5 

Aircraft compartment 

171.68 

10.52 

3.45 

3.0 

Small cavity 

388.26 

24.09 

7.38 

3.3 

Large cavity 

1 969.5** 

154.45 

53.13 

2.9 


In addition to the above data, note that the wall clock times for the optimized frontal method 
were consistently 2 to 3 times greater than those for the bordered banded solver. Presumably this is a 
reflection of the need for disk I/O in the former. Of course, the wall clock time depends on the com- 


puting environment and the level of usage of the machine at any given time. 


2.2. Iterative Solvers 

The iterative solvers that have been incorporated into the ALGAE program are based on a pack- 
age of preconditioned conjugate gradient methods developed by the Mathematical Software Research 
Group at Cray Research, Inc.. A discussion of the package and details of its usage are given in [41 and 
151 . 

The NAMELIST/ START/ variable ISOLV selects the type of solver to be used in ALGAE as fol- 
lows: 


ISOLV = 


0 

1 

2 

3 


optimized frontal method (default) 
bordered-banded direct method 
pure iterative method 
hybrid method 


Note that in addition to the pure iterative solver, a hybrid solver is included that incorporates selected 
calls to the bordered-banded direct solver. The intention is to combine the computational efficiency of 
an iterative solver with the robustness of a direct solver. The strategy controlling the use of these 


solvers is discussed in section 1 .2.3. 
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2 . 2 . 1 . Data Structure 

Let A x = b denote the dual variable system. The sparse column format is used to represent the 
coefficient matrix A. In this format the row indices of the (potentially) nonzero entries of the ;-th 
column are stored in ascending order. The corresponding values are also stored. There is also a 
pointer array containing pointers to the locations of the row index and value of the first nonzero entry 
in each of the columns. 

2.2.2. Dual Variable System Generation 

The dual variable coefficient matrix A is a product A = C T QC, where C is a fundamental matrix 
for the network defined by the finite difference mesh, and Q is the coefficient matrix of the discrete 
momentum equations. Thus, if A = J, C = tc^ m 1 and Q = [^ v l/ then 

&ij = fij vj ■ 

H-v 

Now the data structures of ALGAE are such that the matrix C is defined by the link numbers. There- 
fore, A is generated by determining for each p, v pair the i , j pairs for which a.j receives a contribution 
of the form c^cf^ v c v j, and then accumulating the sums. 

The driver for the iterative solver uses an array to map the contributions to the a,j onto their 
proper locations in the array containing the matrix values. This avoids conditional testing and 
enhances the vectorization process. 

2 . 2 . 3 . Program Control 

For the pure iterative solver, the designated method is used with (or without) user selected 
preconditioning at each time step. If a fatal error is detected during the course of an iteration, or if the 
solution has not satisfied the convergence criterion in the user designated maximum number of itera- 
tions, a message is written and the computations are terminated. 

For the hybrid solver, the solution is computed using the bordered-banded direct solver at the 
first time step and the preconditioning toggle is turned on. For each subsequent time step, an iterative 
solution is attempted. An incomplete LU factorization is computed to precondition the system if the 
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toggle is on, and the toggle is then turned off. Otherwise, the iterative method uses the most recently 
computed incomplete LU factors as the preconditioner. The iterative solution is accepted provided 
that an internal convergence criterion is satisfied and provided that IL4x* - bIL ^ 10^* / lbc*IL, where x* is 
the iterative solution. Otherwise, the system is resolved using the bordered-banded direct solver and 
the preconditioning toggle is turned on. 

2,2.4. Performance Data 

The first problem considered is the preheater problem. This is a modified version of the sample 
problem given in Section 6, Volume 2 of [6]. There are 750 flow cells and 700 dual variables. The tim- 
ings in the following table are based on the average solution time over thirty consecutive time steps 
using a step size of .5 seconds. The original frontal method required 10.52 sec/solve, while the optim- 
ized frontal method took .51 sec/solve. The column headed RATIO gives the ratio of the optimal fron- 
tal solve time to that of the iterative solve time, and the last column gives the maximum pressure drop 
(psi) around the cycles after the final time step. This quantity would be zero for an exact solution, and 
so it may be used as a measure of the accuracy of the iterative solution. 


Method 

ISOLV 

Sec /Solve 

RATIO 

Max. Loop Ap 

CGS 

2 

.15 


iia 


3 

.17 

SB 


GMR 

2 

.14 

3.6 



3 

.19 

2.7 



The second set of sample timings corresponds to a large Poiseuille-type problem with heat added 
to the central portion of the channel. There are 10000 flow cells and 9900 dual variables. The sample 
times of the following table are based on the average solution time over 10 consecutive time steps 
using a step size of 1.0. The original and optimized frontal methods required 193.5 and 15.3 sec/solve. 
The conventions used in describing the previous table again apply. 


Method 

ISOLV 

Sec/Solve 

RATIO 

Max. Loop Ap 

CGS 

2 

3.42 

4.5 

.12x10^ 


3 

4.34 

3.5 

.14xl0 -g 

GMR 

2 

3.04 

5.0 

.48x1 0 -6 


3 

4.83 

3.2 

.14xl0~ g 
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23. Energy Equation 

The original point Causs-Seidel method used in ALGAE to solve the discrete energy equations 
has been replaced by the point Jacobi method. For model problems the asymptotic rate of convergence 
of the Jacobi method is not as large as that of the Gauss-Seidel method. (See, for example, (7|.) How- 
ever, it is amenable to vectorization, whereas the Gauss-Seidel method is not. 

Sample timings were made on the two problems used to test the iterative solvers. The following 
table shows that for these problems the new method resulted in an average speed-up factor (RATIO) 
that is slightly greater than three. 


Problem 

No. Time Steps 

Avg. Solve Times (Sec.) 

RATIO 


Gauss-Seidel 

Jacobi 


Preheater 

30 

.044 


3.4 

Heated Channel 

10 

.328 

.108 

3.04 


3. Modification of Plotting Package for ALGAE 

The original plotting package for ALGAE assumed that all walls were adiabatic. This was also 
the assumption made in the fluid modeling by ALGAE, and was reasonable for thermal transfer in 
reactor components. The ability to allow for conduction from solid walls was added to the code 
ALGAE some years ago, but the plotting package was not changed to adequately display temperature 
distributions near solid walls. As such, some thermal contours, while completely accurate one flow 
cell from the wall, would terminate orthogonal to the wall. The plotting package has been updated 
this year and now displays the correct contour levels up to and including the walls. 

4. Oscillating Flow in a Regenerator Component 

As part of the validation of ALGAE for Stirling Engine simulations, we consider a model of a 
parallel plate regenerator similar to that studied by Ibrahim et. al. [8], 19]. Figure 1 shows a schematic 
of the geometry and lists the assumed physical properties of the working gas (Helium) and wire 1 . 
Also shown are the boundary conditions and mesh spacings used by ALGAE. Note that the angular 
frequency and amplitude of the oscillating pressure drop forcing function are 412.5 sec -1 and .0349 psi 


1 These were suggested by M. Ibrahim in a private communication, dated Dec 11, 1991. 
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p(t) 


T =436.50 °F 


Free slip adiabatic wall 

R i = radius of channel = .2 1 8667 E-03 ft 
R 2 = radius of wire = .41666 E-04 ft 
L = length of channel = .3 E-02 ft 

Ay =. 13975 E-04, Ay 0 = -57545 E-05, Ay 3 =.38363 E-05, Ay 4 = .10417 E-04 

Ax = .17328 E-04 ft 


Helium Properties: 

Molecular weight : 4.002 

Specific Heat: 1 .24 BTU/lbm °F 

Thermal conductivity: 3.53 E-05 BTU/sec ft °F 

Viscosity: 0.1 85 E-041bm/ft sec 

Density: 0.93767 lbm/ft3 


Steel Properties: 

Specific Heat: 0.1 1 1 7 BTU/lbm °F 

Thermal conductivity: 0.21 5 E-02 BTU/sec ft °F 

Density: 51 4.28 Ibm/ft 5 


P(t)= 2178.073 + 0.0349 Cos ( s / 2 + 412.5t ) 

Figure 1. Parallel plate regenerator. 

respectively. This corresponds to a Valensi number, Va, of 1 and maximum Reynolds number, Re^a*, 
of 262, and facilitates comparison of the present results with those of [9|, Figures 1 and 2, where these 
quantities were 1 and 200. Recall that Va and Re^^ are defined as follows: 


,, t0D|, 2 I'maxDfc 

1/® ' . / R^max — 

4v V 


where to is the frequency, D h the hydraulic diameter, v the kinematic viscosity, and « max the maximum 
over one half cycle of the average axial velocity. 
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Unlike the one dimensional numerical methods reported in (81 and (9| ( ALGAE solves the full 
(unsteady) two dimensional Navier-Stokes and heat convection-conduction equations. Thus, radial 
convection, conduction, and momentum effects are directly accounted for. 

Figures 2 and 3 show axial Velocity and Temperature profiles as a function of the radial distance 
measured from the centerline of the channel. The 12 curves shown on each figure correspond to incre- 
ments of 30' in the crank angle as indicated by the accompanying key. These plots were made after a 
start up transient of about 1.25 sec. They should be compared with Figures 1 and 2 of [9| which give 
analogous plots of these quantities for the numerical model used in that work. 

As shown in Figure 2, the time dependent behavior of the channel centerline axial velocity is 
cyclic with an amplitude of about 18 ft/sec. This should be compared with the analogous result of (91, 
Figure 1, where the amplitude is =13 ft/sec. The temperature variation of the Helium given by Figure 
3 is approximately 49 F about a temperature of 406 F, whereas that of Figure 2 in (9) is -31 F about 
408 °F. On the other hand the temperature variation in the wire shown in Figure 3 is about 2 F, while 
that of Figure 2 in [9| is =14 F. Since the main mechanism for heat transfer in the Helium is convec- 
tion, the 49 F variation in the gas would presumably be reduced if Re^ were reduced from 262 to 
200. This will probably also reduce the variation in the wire. The two dimensional calculations do not 
yield as rapid a response of the temperature at the gas-wire interface as those of [9]. This may be due 
to a poor resolution of the boundary layer because of the coarseness of the spatial mesh near the inter- 


face. 



Velocity (ft/sec) 


- 11 - 



Symbol 


KEY 

Crank Angle (deg) 

210, 30 
180, 0 
240, 60 
150. 330 
270, 90 
120, 300 


<£ Value (ft/sec) 

17.9, -17.7 
17.21. -16.94 

13.9, -13.74 
12.13. -11.69 
6.25. -6.09 
3.18, -3.21 


Figure 2. Axial velocity profiles 




Radial Distance (ft) 


KEY 

Crank Angle (deg) 

210 , 30 
180 , 0 
240 , 60 
150 , 330 
270 , 90 
120 , 300 


<£ Value (deg F) 
400.51, 413.09 
411.31. 402.33 
385.83, 428.5 
420.40. 393.16 
382.28. 431.29 
428.34, 385.12 


Figure 3. Temperature profiles 
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